rubric={raw:10}
git add them.
1st activity worksheet: Activity 1 Worksheet
2nd activity worksheet: Activity 2 Markup Doc
Before the lab, read David Robinson's blog post about Donald Trump's Twitter account during the 2016 campaign race at least once. The entire lab is based around this post.
1a: The hypothesis of the article is that data and text analysis can help determine that a different person is tweeting Donald Trump's tweets on Android and iPhone based on the tweet patterns and content.
rubric={reasoning:8}
Report on your reactions to the article. In particular:
Note: A couple sentences is sufficient for each of the three points above. No need to write a lengthy response.
1b:
Confusion: As a frequent Twitter user, I found that most of the Twitter wrangling made sense, but I think the article would have been clearer with a brief explanation of the mechanics of hashtags, quote tweets, and image linking. Furthermore, as a data science student I was quite interested in the sentiment analysis, specifically the relative percent increase in emotion word rates between iPhone and Android, but there was no explanation or code presented on how this was done.
Impressed: I was impressed by how quickly and efficiently the author was able to scrape Twitter data, and that they were thoughtful enough to provide a data dump file for readers who didn't want to or were unable to set up Twitter authentication. Also impressive was the author's command of the R language. As a newer R user, I found it enlightening to see some advanced and clean R code examples!
Impression: Overall, I quite enjoyed the article and found it really interesting. The author was able to take a topic familiar to most people and demonstrate a lot of data analysis to support their hypothesis in a fairly short article.
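To make the point about the missing explanation concrete, here is a minimal sketch of how a relative percent increase in emotion word rates between platforms could be computed. It is my own illustration, not the method from the article: the tiny `EMOTION_LEXICON`, the helper names `emotion_rates` and `relative_percent_increase`, and the sample tweets are all hypothetical.

```python
from collections import Counter

# Hypothetical mini-lexicon mapping words to emotions (an assumption for
# illustration; a real analysis would use a published sentiment lexicon).
EMOTION_LEXICON = {
    "sad": "sadness",
    "terrible": "disgust",
    "great": "joy",
    "honor": "trust",
}


def emotion_rates(tweets):
    """Return {emotion: matching words / total words} for a list of tweets."""
    # Split on whitespace, trim surrounding punctuation, and lowercase.
    words = [w.strip(".,!?\"'").lower() for t in tweets for w in t.split()]
    counts = Counter(EMOTION_LEXICON[w] for w in words if w in EMOTION_LEXICON)
    total = len(words)
    return {emotion: n / total for emotion, n in counts.items()} if total else {}


def relative_percent_increase(rate_a, rate_b):
    """Percent by which rate_a exceeds rate_b (negative if it is lower)."""
    return (rate_a - rate_b) / rate_b * 100


# Made-up example tweets, not real data.
android = ["So sad and terrible!", "A terrible deal"]
iphone = ["Great honor to be here", "Terrible weather but a great crowd today"]

android_rates = emotion_rates(android)
iphone_rates = emotion_rates(iphone)
increase = relative_percent_increase(android_rates["disgust"], iphone_rates["disgust"])
print(f"Disgust words are {increase:.0f}% more frequent on Android in this toy sample")
```

Dividing by the total word count per platform matters here, because the two devices contribute different numbers of tweets and raw counts alone would not be comparable.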
rubric={reasoning:6}
Based on your reactions to David Robinson's post, what is one thing that you think is worth emulating when communicating the results of data analysis? What is one thing worth avoiding?
Recommended answer length: 100 words. Max answer length: 200 words.
1c:
I really liked David Robinson's general brevity and clear messaging when announcing the conclusions of each section of analysis, and I want to remind myself to emulate this in the future. I appreciated that he didn't spend an inordinate amount of time on pontification and purple prose about his results. For a blog post, I think this is important because the audience wants quick, entertaining pacing rather than an extended deep dive into the content. However, as mentioned earlier in 1b, I think a quick summary of some of the more esoteric mechanics of Twitter before introducing the analysis might have been beneficial. Even as a heavy Twitter user, I found the reasoning and conclusions behind his tweet wrangling a little confusing. I think it’s important to avoid assuming advanced knowledge from your audience.
rubric={reasoning:6}
Who do you think is the intended audience for David Robinson's post? Describe the target reader in as much detail as you can. Is it targeted at people of specific age group? Nationality? Political inclination? Make sure your answer includes the assumed level of data science skills for the target reader.
Note: In part (b) you'll justify your claims.
2a: I believe the main target audience of David Robinson's post is data science practitioners of moderate skill level who are interested in learning more about sentiment analysis. The article is a subtle advertisement for, and demonstration of, the new R package tidytext, which David and his collaborator Julia Silge developed. I would argue the assumed skill level is moderate because beginner data scientists might not be comfortable with the assumed scraping knowledge, while advanced data scientists would already have their own sentiment analysis workflows. Because the subject is Donald Trump, we can infer that the audience would need to be quite familiar with Trump, and is therefore more likely American, or at least North American. The assumed knowledge of the Twitter platform and the political theme indicate that the target age demographic would mostly match the intersection of these groups, which I would estimate to be young adults to middle-aged adults. Finally, while the author tries to stay politically neutral for much of the article, the negative connotations in the descriptions of Donald Trump hint that the target audience leans left politically.
rubric={reasoning:6}
How do you know what the audience is? For your claims in part (a), give specific examples from the original text that justify your claims. 1-2 sentences per claim is sufficient.
2b:
North American nationality: When the author states "These tweets certainly sound like the Trump we all know", they are assuming the audience is very familiar with Trump's media presence. While this does not exclude other nationalities, North American audiences, and American audiences especially, are exposed more frequently to Trump's speech patterns.
Age demographic of young adults to middle-aged adults: When the author opens the article by stating "I don’t normally post about politics", it clearly signals the political topic, and further into the introduction the author assumes Twitter familiarity by casually referencing "using hashtags, links, and retweets in distinct ways". The increase of interest in politics with age has been heavily studied, with an example here, and Twitter demographics tend to skew slightly older than other social media platforms, as indicated here, which gives us an age demographic intersection of young adults to middle-aged adults.
Left political inclination: When the author ends the article by asking "Like Tony Schwartz, will they one day regret their involvement?" about the tweet ghost-writer, they show concern about the negative effects of Trump, which would appeal to people on the opposite side of Trump's political ideology.
rubric={reasoning:8,writing:8}
David Robinson's blog post definitely assumes some level of data science knowledge. However, non-data scientists might be interested in his findings as well. Your task is to rewrite a much shorter version of the blog post, this time targeting it toward a reader without data science knowledge. You can assume the reader has the basic knowledge required to understand the argument - they know what an iPhone is, what Android is; they know who Donald Trump is and some basic facts about him; they know what Twitter is and what a tweet is and that @realDonaldTrump is Trump's (former) Twitter handle. However, they have no background in programming, statistics, or data science.
Recommended length: 300 words. Maximum length: 500 words. The original blog post is around 2000 words, so keep in mind that your version needs to be much shorter than the original.
Note: don't refer to David Robinson or David Robinson's post in your post. For the purposes of this exercise, you are pretending to be David Robinson writing a separate post for a different audience. You are welcome to use the first person (e.g. "I analyzed the data") but you don't have to.
Note: while this would normally be considered plagiarism, for this exercise you are welcome to copy from David Robinson's post. In general this is probably a bad idea, since the new audience will probably require a complete rewrite, but we will leave this option open to you for both David Robinson's text and visualizations. If you include visualizations, make sure they render properly in your HTML on Canvas so that the grader can see them. However, again, think carefully about whether including any original material actually makes sense for your new audience.
I recently discovered a tweet that asks an interesting question: Are Trump's Android and iPhone tweets written by different people?
The premise is that different tweeting patterns and tweet content between devices indicate separate people, as noted anecdotally by others. People have noted that the Android content mimics Trump's verbal speech patterns, is more negative, uses fewer Twitter mechanics, and that Trump himself uses an Android for tweets.
Conversely, his iPhone tweets tend to report state events, public relations content, and contain more hashtags and linked imagery.
I was excited to perform a data analysis to quantitatively measure the differences between Android and iPhone tweet data to help determine if these are in fact different people!
The first metric I investigated was whether Android and iPhone tweets were sent at different times of the day. From the 762 Android tweets and 628 iPhone tweets available, we see the following time breakdown by platform.
We can see a clear time difference between when the Trump account posts with Android versus the iPhone. The iPhone posts occur during the mid-morning and early evening timeframes, which would hint at work hours. In contrast, the Android platform tweets occur frequently in the early morning and late at night, which coincides with recreational use hours.
Another key observation is that the iPhone and Android tweets use Twitter features such as retweets, hashtags, and picture links very differently. One particularly distinctive habit is the Trump account's old-fashioned practice of “manually retweeting” people by copy-pasting their tweets, then surrounding them with quotation marks.
We can see in the plot below that almost all of these “manual retweets” happen on Android, and account for about a third of his Android tweets!
Another difference in Twitter feature usage between Android and iPhone is the use of pictures or links, demonstrated below:
Here we see that tweets from the iPhone were 38 times as likely to contain either a picture or a link!
Finally, by looking at the word content differences between platforms we can measure differences in sentiment, which is a description of emotion or word connotation. By comparing word sentiment by platform below, I show differences in sentiment between text in Android and iPhone tweets.
Here, we can see that words with negative sentiments (sadness, disgust) occur much more commonly than positive sentiments (joy, trust) in Android tweets!
My analysis concludes that the Android and iPhone tweets are clearly from different people, posting during different times of day and using hashtags, links, and retweets in distinct ways. What’s more, we can see that the Android tweets are angrier and more negative, while the iPhone tweets tend to be benign announcements and pictures.
When you are ready to submit your assignment do the following:
Restart and run the whole notebook: Kernel -> Restart Kernel and Run All Cells...
Convert your notebook to .html format using the convert_notebook() function below or by File -> Export Notebook As... -> Export Notebook to HTML
Run submit() below to go through an interactive submission process to Canvas.
# convert_notebook("lab1.ipynb", "html") # uncomment and run when you want to try to convert your notebook (or you can convert manually from the File menu)
# submit(course_code=53666, token=False) # uncomment and run when ready to submit to Canvas